WIP: Plan CodeQL sandbox recipe refactor by pruiz · Pull Request #45 · pruiz/CodeCome

pruiz · 2026-06-05T20:58:06Z

Summary

Adds a detailed WIP implementation plan for refactoring the CodeQL integration around sandbox-generated build recipes.

The plan proposes:

reordering Phase 1 to run sandbox bootstrap before CodeQL;
adding itemdb/notes/sandbox-recipe.yml as the machine-readable contract between sandbox and CodeQL;
keeping codeql-plan.yml as the CodeQL analysis-intent document;
supporting multi-unit CodeQL through sandbox build targets rather than multiple sandboxes;
adding CodeQL run health classification to avoid false successful runs;
moving to per-run CodeQL artifact directories to prevent stale SARIF/normalized output reuse;
updating prompts, gates, docs, tests, and subphase names consistently.

Scope

This PR is intentionally documentation-only for review. No implementation code is changed yet.

Review focus

Please review especially:

The new Phase 1 order.
The proposed sandbox-recipe.yml schema.
The relationship between codeql-plan.yml analysis units and sandbox build targets.
The CodeQL health/usability rules.
The implementation sequence and acceptance criteria.

Summary by CodeRabbit

New Features
- Introduced Phase 1 sub-stage restructuring: Phase 1a (Profile), Phase 1b (Sandbox Bootstrap), Phase 1c (Detailed Reconnaissance)
- Added sandbox recipe as a machine-readable contract for sandboxes
- Implemented CodeQL health classification model to determine result usability
- Added Docker-based CodeQL execution support with platform compatibility checks
- New sandbox bootstrap CLI commands for recipe validation and output
Documentation
- Updated workflow documentation with new three-stage Phase 1 breakdown
- Added sandbox recipe schema and configuration documentation
- Updated prompts to reflect restructured phase responsibilities and outcomes
- Added example template files for sandbox recipe configuration
Tests
- Added comprehensive test suites for health classification and sandbox recipes
- Extended validation tests for CodeQL plan schema v2 and empty language lists
- Added tests for Docker platform detection
Chores
- Updated .gitignore to exclude token usage artifacts
- Updated Makefile help text for phase labeling

coderabbitai · 2026-06-05T20:58:12Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: fdec1ef5-8f2d-4174-84c7-37ddf371a8c6

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

✅ Review completed - (🔄 Check again to review again)

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch wip/codeql-sandbox-recipe-refactor

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-05T20:59:50Z

Coverage Report

Metric	Value
Line Coverage	76.3%
Lines Covered	0 / 0

Download detailed HTML coverage reports per OS/Python from the workflow artifacts.

Generated by pytest-cov on 2026-06-07T19:00:47.934Z

Add tools/sandbox/recipe.py with load_recipe, validate_recipe, dump_recipe. Add recipe-validate and recipe-print subcommands to sandbox-bootstrap.py. Add templates/sandbox-recipe.yml.example (schema v1). Add tests/test_sandbox_recipe.py (24 tests covering valid/invalid schemas). Update docs/sandbox.md and templates/sandboxes/README.md with recipe docs. Update .project/codeql-sandbox-recipe-refactor-plan.md with locked plan.

tools/codeql/packs.py: load_codeql_plan requires schema_version: 2, v1 raises PackResolverError with clear upgrade guidance. templates/codeql-plan.yml: updated to v2 schema with build_provider, sandbox_build_target, and new commentary. tools/phases/phase_1_gates.py: gate 1a now calls load_codeql_plan() so v1 plans are rejected at the gate with the upgrade error message. prompts/phase-1a-profile.md: updated to v2 rules, including sandbox_build_target, build_provider, and new language about sandbox being built in Phase 1b. Tests: all plan fixtures bumped to v2, new test_schema_v1_rejected_at_gate_1a.

Rename prompts/phase-1c-sandbox.md -> prompts/phase-1b-sandbox.md. Update content to reflect new position (second sub-stage instead of third). Add sandbox-recipe.yml as a first-class required output alongside sandbox-plan.md. Add recipe schema rules, per-target guidance, and validation instructions. Update test_phase_1c_reads_threat_model to use the renamed prompt file.

tools/codeql/health.py: compute_health() with classifications covering disabled, skipped, failed, extraction-failed, analysis-failed, completed-empty-valid, completed-with-signals, and more. tools/codeql/pipeline.py: generate run_id (UTC timestamp + hash), create per-run directory under itemdb/codeql/runs/<id>/, write last-run-manifest.yml and current-run.txt, compute health before importing risk signals. tools/codeql/runner.py: accept optional run_dir parameter; write SARIF, databases, and selected-query-packs.yml into per-run dir. tools/codeql/artifacts.py: prefer last-run-manifest.yml, fall back to run-manifest.yml; consume health block for usability gating. tests/test_codeql_health.py: 11 tests covering all health classifications. tests/test_codeql_pipeline.py: updated for per-run layout. tests/test_codeql_artifacts.py: updated for manifest fallback.

tools/codeql/platform.py: host_platform() and container_platform() for detecting OS/arch mismatch between host and sandbox container. tools/codeql/in_docker.py: check_platform() and exec_codeql() for running CodeQL inside a Docker Compose service. Platform guard classifies units as unavailable when host (e.g. Darwin) and container (e.g. Linux) platforms differ under mount-host-bundle strategy. templates/sandboxes/_shared/codeql.sh: wrapper script invoked by the harness, resolving compose file + service from env or defaults. templates/sandboxes/*/docker-compose.yml: added read-only bind mount of host .tools/codeql/current at /opt/codeql inside every container. tests/test_codeql_platform.py: 7 tests for platform detection. tests/test_codeql_in_docker.py: 6 tests for Docker executor.

… repair Phase 1 orchestra tor reordered: 1a: Target Profile → gate check_phase_1a 1b: Sandbox Bootstrap → gate check_phase_1c CodeQL runs after sandbox (post-sandbox) 1c: Detailed Reconnaissance → gate check_phase_1b Removed _run_codeql_repair_if_needed and all CodeQL repair helpers. Removed codeql-plan auto-repair from subphase retry loop. Removed build_codeql_plan_resume_prompt and build_codeql_build_failure_resume_prompt from completion.py. Stopped referencing prompts/phase-1-codeql-repair.md (file kept in tree until cleanup). Renamed prompts/phase-1b-recon.md → prompts/phase-1c-recon.md. Updated gate messages and test references. Skipped 17 tests pending deletion/update in commits 7-8.

prompts/phase-1c-recon.md: updated to Phase 1c, added health-aware CodeQL artifact reading rules (read last-run-manifest.yml, skip signals when health.usable is false, record health summary in threat-model.md). docs/workflow.md: rewritten Phase 1 section with the new 1a→1b→1c order. Phase 1a produces target profile + build model + codeql plan. Phase 1b produces sandbox + sandbox-recipe.yml. Phase 1c produces full recon notes enriched with CodeQL when usable. Makefile help text: Sandbox bootstrap (Phase 1c) → (Phase 1b). .opencode/skills/sandbox-bootstrap/SKILL.md and .opencode/agents/recon.md already reference Phase 1b as sandbox bootstrap — aligned with new order.

Delete prompts/phase-1-codeql-repair.md and tests/test_phase_1_codeql_plan_repair.py. Remove three skipped repair-related tests from test_codecome_check_codeql.py. Full test suite: 730 passed, 0 failures.

Copilot

Pull request overview

This PR advances the planned “CodeQL + sandbox recipe refactor” by introducing the sandbox recipe contract + validator, reordering Phase 1 (sandbox bootstrap before CodeQL, detailed recon after), and reshaping CodeQL execution artifacts around per-run directories with health classification.

Changes:

Add itemdb/notes/sandbox-recipe.yml schema example plus Python loader/validator and CLI helpers (recipe-validate, recipe-print).
Reorder Phase 1 to 1a → 1b (sandbox) → CodeQL → 1c (recon) and update gates/prompts/docs/tests accordingly.
Introduce per-run CodeQL output layout (runs/<run-id>/...), last-run-manifest.yml, current-run.txt, and a health model used by artifact gating and recon prompts.

Reviewed changes

Copilot reviewed 54 out of 55 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
tools/sandbox/recipe.py	New loader/validator for `sandbox-recipe.yml`.
tools/sandbox/init.py	Establish `sandbox` as an importable package.
tools/sandbox-bootstrap.py	Add `recipe-validate` / `recipe-print` subcommands and constants for recipe path.
tools/phases/phase_1_gates.py	Switch CodeQL plan loading to shared loader; update Phase 1 messaging for new order.
tools/phases/completion.py	Remove CodeQL plan/build failure resume prompt helpers tied to the old repair flow.
tools/codeql/runner.py	Add optional `run_dir` support to write artifacts per-run (legacy mode preserved).
tools/codeql/platform.py	New host/container platform detection helpers.
tools/codeql/pipeline.py	Create per-run directories, write `last-run-manifest.yml`/`current-run.txt`, compute health, gate risk import.
tools/codeql/packs.py	Enforce CodeQL plan `schema_version: 2` with actionable upgrade error.
tools/codeql/in_docker.py	New helpers for CodeQL execution inside Docker Compose services + platform guard.
tools/codeql/health.py	New health classification model for “usable” vs “unusable” CodeQL runs.
tools/codeql/artifacts.py	Gate artifacts based on `health` when present; support last-run manifest.
tools/codecome/phase_1.py	Reorder Phase 1 execution; remove prior CodeQL repair loop and associated validation logic.
tests/test_sandbox_recipe.py	New unit tests for recipe load/validation rules.
tests/test_phase_failure_state_reset.py	Update prompt path usage consistent with new Phase 1 prompt naming.
tests/test_phase_1_prompts_threat_model.py	Update prompt existence/content checks for renamed Phase 1 prompts.
tests/test_phase_1_mid_turn_forgiveness.py	Update prompt path usage consistent with new Phase 1 prompt naming.
tests/test_phase_1_gates.py	Update CodeQL plan fixtures to schema v2; add test that v1 is rejected.
tests/test_phase_1_codeql_plan_repair.py	Remove tests for deleted CodeQL repair flow.
tests/test_codeql_runner.py	Update CodeQL plan fixtures to schema v2 where needed.
tests/test_codeql_platform.py	New tests for platform detection utilities.
tests/test_codeql_pipeline.py	Update pipeline tests for per-run layout + last-run manifest behavior.
tests/test_codeql_packs.py	Update pack/plan fixtures for CodeQL plan schema v2.
tests/test_codeql_in_docker.py	New tests for Docker execution helpers and platform guard behavior.
tests/test_codeql_health.py	New tests for health classification outcomes.
tests/test_codeql_artifacts.py	Update assertions to match new “manifest missing” wording.
tests/test_codecome_check_codeql.py	Update plan schema fixture (v2) and remove repair-related tests.
templates/sandboxes/web-static/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/rust/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/ruby/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/python/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/php/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/node/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/nested-virt/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/multi-service-compose/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/java-maven/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/iac-terraform/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/go/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/generic/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/erlang-otp/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/dotnet/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/c-cpp/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/_shared/codeql.sh	New shared wrapper for invoking CodeQL via `docker compose exec`.
templates/sandboxes/README.md	Document the new sandbox recipe artifact and CLI helpers.
templates/sandbox-recipe.yml.example	Add example schema for the sandbox recipe contract.
templates/codeql-plan.yml	Bump CodeQL plan template to schema v2 and document sandbox mapping fields.
prompts/phase-1c-recon.md	Update recon prompt to Phase 1c and use health-aware artifact consumption rules.
prompts/phase-1b-sandbox.md	Update sandbox bootstrap prompt to Phase 1b and require `sandbox-recipe.yml`.
prompts/phase-1a-profile.md	Update Phase 1a prompt rules for CodeQL plan schema v2.
prompts/phase-1-codeql-repair.md	Remove obsolete repair prompt (no longer used).
Makefile	Update help text to reflect sandbox bootstrap as Phase 1b.
docs/workflow.md	Update workflow docs for 3-substage Phase 1 and new outputs.
docs/sandbox.md	Document sandbox recipe artifact and how to validate/print it.
.project/codeql-sandbox-recipe-refactor-plan.md	Add/extend detailed implementation plan document.
.gitignore	Ignore `token-usage-output.txt`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+Branch purpose: define the target architecture and implementation plan before changing runtime code.
+
+## 1. Problem statement


+def _generate_run_id() -> str:
+    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
+    fingerprint = hashlib.sha256(ts.encode()).hexdigest()[:8]
+    return f"{ts}-{fingerprint}"


    out.separator(tone=T.SUCCESS)
-    out.success("Ready to run Phase 1c (Sandbox Bootstrap).")
+    out.success("Ready to run Phase 1c (Detailed Reconnaissance).")
    return 0


+def test_host_platform_known_os() -> None:
+    plat = host_platform()
+    assert any(kw in plat.lower() for kw in ("linux", "darwin", "windows"))
+


+            sandbox_path = Path(root) / sandbox_path_str
+            if not sandbox_path.exists():
+                errors.append(
+                    f"sandbox-recipe.yml: sandbox.path {sandbox_path_str!r} does not exist"
+                )


+    manifest.setdefault("health", health)
+


+        result = subprocess.run(
+            ["uname", "-sm"], capture_output=True, text=True, timeout=10
+        )
+        return result.stdout.strip()


+            ],
+            capture_output=True, text=True, timeout=timeout,
+        )
+        return result.stdout.strip() or result.stderr.strip()


+            f"CodeQL bundle is for {host_plat}; sandbox service "
+            f"{service!r} runs {container_plat}. "
+            "install_strategy=mount-host-bundle cannot cross platforms. "
+            "Use install_strategy=download-in-container or image-preinstalled "
+            "(not yet supported)."


tests/test_prompts.py: add test_no_stale_phase_1b_1c_references_in_prompts that greps all prompt files for contradictory Phase 1b/1c patterns, plus self-consistency tests for phase-1b-sandbox.md and phase-1c-recon.md. prompts/README.md: update filenames and descriptions to new 1b/1c order. prompts/sweep.md: add 'CodeQL signals (conditional)' section instructing the sweep to check health.usable before importing CodeQL signals. prompts/phase-6-report.md: add health-aware rule in 'Reporting rules' section instructing the reporter to check last-run-manifest.yml health.

coderabbitai

Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

prompts/phase-1b-sandbox.md (1)
138-138: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix sub-stage reference in retry-exhaustion instruction.

This line says “stop Phase 1c” inside the Phase 1b prompt. It should refer to stopping Phase 1b (or stopping Phase 1 progression) to avoid contradictory operator guidance.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@prompts/phase-1b-sandbox.md` at line 138, Update the incorrect sub-stage
reference in the retry-exhaustion instruction so it no longer tells the operator
to "stop Phase 1c"; change the phrase to "stop Phase 1b" (or "stop Phase 1
progression") in the sentence that mentions retry budget and logging to
sandbox-plan.md and references CODECOME_BOOTSTRAP_MAX_RETRIES, ensuring the
instruction consistently refers to Phase 1b.
prompts/phase-1c-recon.md (1)
178-180: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix stale CodeQL artifact path and stale Phase filename in Phase 1c prompt.

This prompt still contains pre-refactor references that conflict with the new Phase 1c/per-run model: Line 178 points to itemdb/codeql/normalized/file-signals.yml instead of the per-run location, and Line 274 still writes phase-1b-summary during Phase 1c. Both can cause incorrect file reads/writes by the agent.
🛠️ Proposed doc fix
-If CodeQL file signals exist (`itemdb/codeql/normalized/file-signals.yml`), incorporate them:
+If CodeQL file signals exist (`itemdb/codeql/runs/<run-id>/normalized/file-signals.yml`), incorporate them:
@@
-    runs/phase-1b-summary-YYYY-MM-DD-HHMMSS.md
+    runs/phase-1c-summary-YYYY-MM-DD-HHMMSS.md
Also applies to: 274-274
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@prompts/phase-1c-recon.md` around lines 178 - 180, Update the stale CodeQL
artifact path and the Phase filename in the prompt: replace references to the
legacy static path "itemdb/codeql/normalized/file-signals.yml" with the per-run
CodeQL artifact location used by the new Phase 1c model (the per-run path your
pipeline writes for CodeQL file signals), and change the summary output filename
from "phase-1b-summary" to "phase-1c-summary" so the agent reads/writes the
correct per-run artifacts; search for those exact strings in the prompt text and
update them accordingly.
tests/test_codeql_pipeline.py (1)
212-253: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Duplicate test name overrides the earlier test case.

Line 253 redefines the same function name from Line 212, so the first test is never executed.
Suggested fix
-def test_pipeline_normalize_failure_marks_soft_failed_for_soft_policy(tmp_path: Path) -> None:
+def test_pipeline_normalize_failure_sets_expected_last_manifest_for_soft_policy(tmp_path: Path) -> None:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_codeql_pipeline.py` around lines 212 - 253, The file defines
test_pipeline_normalize_failure_marks_soft_failed_for_soft_policy twice, causing
the first definition to be overridden; rename the duplicate test function to a
unique, descriptive name (for example change the second occurrence to
test_pipeline_normalize_failure_marks_soft_failed_for_soft_policy_duplicate or a
name reflecting its intent) so both tests are distinct, and update any
references or fixtures if needed to match the new function name (look for the
symbol test_pipeline_normalize_failure_marks_soft_failed_for_soft_policy in this
diff to locate both definitions).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.project/codeql-sandbox-recipe-refactor-plan.md:
- Around line 857-859: The doc uses a non-canonical key `Sandbox_build_target`
(capital S) instead of the agreed canonical `sandbox_build_target`, causing
copy/paste drift; update the text at the occurrence of `Sandbox_build_target` to
use `sandbox_build_target` so it matches the schema and examples (search for the
token `Sandbox_build_target` and replace with `sandbox_build_target` in the
description of Multi-target invocation).

In `@Makefile`:
- Line 109: You changed the Makefile (the printf line that prints "Sandbox
bootstrap (Phase 1b)"), which is disallowed; revert that modification so the
Makefile remains exactly as in main (remove that hunk from your commit or reset
the file), then amend the PR to exclude any Makefile/orchestration/config edits;
reference the changed printf line to locate and undo the change and ensure only
allowed source files are included in the branch before pushing.

In `@prompts/phase-1a-profile.md`:
- Around line 68-73: The doc contradicts itself about requiring build_command:
clarify that when build_provider is "sandbox-recipe"
analysis_units[].sandbox_build_target should point to the sandbox-recipe.yml
build_targets[].id and build_command must be left empty even if build_mode is
"manual"; update the guidance so that build_command is only required when
build_provider is "none" or when build_mode is "manual" and build_provider is
not "sandbox-recipe" (i.e., for manual, non-recipe builds supply build_command),
and state that codeql-plan.yml should never include a concrete build_command for
units using sandbox-recipe (use sandbox_build_target instead).

In `@prompts/sweep.md`:
- Around line 24-27: Update the prompt to use per-run CodeQL paths: read the run
id from itemdb/codeql/last-run-manifest.yml (as before) and then reference
enrichment files under itemdb/codeql/runs/<run_id>/normalized/alerts.yml and
itemdb/codeql/runs/<run_id>/normalized/file-signals.yml instead of the old
top-level itemdb/codeql/normalized/... paths; ensure any mention of
"normalized/alerts.yml" or "file-signals.yml" in this prompt is replaced with
the per-run path pattern using the manifest's run_id.

In `@tests/test_phase_1_prompts_threat_model.py`:
- Around line 64-66: The test test_phase_1c_reads_threat_model is reading the
wrong fixture ("phase-1b-sandbox.md"); change the call to _read_prompt to open
the Phase 1c recon prompt (e.g., "phase-1c-recon.md") so the test actually
verifies the Phase 1c recon requirement, leaving the assert ("threat-model.md"
in content) as-is; update the string literal in the test function
test_phase_1c_reads_threat_model to the correct prompt filename.

In `@tests/test_prompts.py`:
- Around line 30-53: The regex checks in stale_patterns can miss matches
spanning multiple lines because re.search is called without DOTALL; update the
search to allow '.' to match newlines (e.g., use re.DOTALL / re.S) so patterns
like "Phase 1b.*Detailed Reconnaissance" will match across line breaks. Locate
the loop that iterates prompts_dir.glob("*.md") and the call to
re.search(pattern, content, re.IGNORECASE) and change the flags to re.IGNORECASE
| re.DOTALL (or compile the patterns with re.S) while preserving the existing
exceptions logic and pattern strings.

In `@tests/test_sandbox_recipe.py`:
- Around line 81-85: Replace the use of the bare `assert False, "Expected
ValueError"` in the exception tests that call load_recipe(path) with an explicit
exception raise to avoid removal under optimized execution; specifically, change
those failure branches to `raise AssertionError("Expected ValueError")` (and
apply the same replacement for the similar block around the second exception
test) so the test still fails clearly when no ValueError is raised.

In `@tools/codeql/health.py`:
- Around line 183-191: The current logic returns "skipped" whenever
has_languages is false even if status == "failed", hiding real failures; update
the conditional so that failures are preserved—either move the status ==
"failed" block above the has_languages check or change the has_languages branch
to only return "skipped" when status != "failed"; make sure to still use db_ok
and analyze_ok (the existing variables) to return the appropriate "failed"
message ("CodeQL database creation failed.", "CodeQL analysis failed.", or
"CodeQL pipeline failed.").

In `@tools/codeql/in_docker.py`:
- Around line 25-37: The guard only treats "mount-host-bundle" as a host-bundle
case but misses "copy-host-bundle", so update the conditional that checks
install_strategy to include "copy-host-bundle" (e.g., install_strategy not in
("mount-host-bundle", "copy-host-bundle")) and ensure the subsequent validation
uses host_platform(), container_platform(service, compose_file), and
platforms_compatible() to reject cross-platform combinations with an error
message that mentions both mount-host-bundle and copy-host-bundle as unsupported
for cross-platform installs.

In `@tools/codeql/pipeline.py`:
- Around line 16-19: The _generate_run_id function currently derives fingerprint
solely from ts, causing identical run IDs for starts within the same second;
update _generate_run_id to add non-deterministic entropy (e.g., include
microseconds, process id, or a random value such as uuid.uuid4().hex or
secrets.token_hex) into the hashed input or as part of the fingerprint so each
invocation is unique; keep the return format f"{ts}-{fingerprint}" and update
the fingerprint calculation in _generate_run_id to hash timestamp plus the
chosen random/entropy source (or replace hashing with a truncated uuid4) to
avoid collisions.

In `@tools/codeql/platform.py`:
- Around line 15-20: The probe currently returns result.stdout.strip() even when
the command fails; change the subprocess.run invocation handling in the probe
(the call that uses ["uname", "-sm"] and the similar probe at lines 31-40) to
treat non-zero exit codes as failures: after subprocess.run(...) check
result.returncode and if it's non-zero (or result.stdout is empty/unreliable)
return "unknown" instead of returning stdout/stderr; update the code paths
around the subprocess.run call and the existing return result.stdout.strip() so
both probes consistently return "unknown" on non-zero exits or empty output.

In `@tools/sandbox-bootstrap.py`:
- Around line 1562-1606: The cmd_recipe_validate and cmd_recipe_print functions
implement business logic in the top-level tools/sandbox-bootstrap.py; per policy
this file must be a thin delegating wrapper—move the implementation into a
package module (e.g., create tools.sandbox.recipe_commands or similar) and keep
these functions as one-line delegators that import and call the package
functions (e.g., recipe_commands.validate_recipe_cmd and
recipe_commands.print_recipe_cmd) passing through args; ensure all imports and
heavy logic (load_recipe, validate_recipe, dump_recipe, _emit, error handling
and printing) live in the new package module and update the wrapper functions
cmd_recipe_validate and cmd_recipe_print to simply import and return the package
function results.

In `@tools/sandbox/recipe.py`:
- Around line 175-192: The helper _validate_codeql_hints currently skips
validating the top-level codeql key default_execution_mode; add a validation
block (similar to the preferred_mode check) that reads
codeql.get("default_execution_mode"), ensures it's a string and one of
VALID_EXECUTION_MODES, and appends an error with the same format
(f"sandbox-recipe.yml: {prefix}.default_execution_mode {value!r} invalid
(allowed: {valid})") when invalid so validate_recipe() will catch bad
default_execution_mode values.

---

Outside diff comments:
In `@prompts/phase-1b-sandbox.md`:
- Line 138: Update the incorrect sub-stage reference in the retry-exhaustion
instruction so it no longer tells the operator to "stop Phase 1c"; change the
phrase to "stop Phase 1b" (or "stop Phase 1 progression") in the sentence that
mentions retry budget and logging to sandbox-plan.md and references
CODECOME_BOOTSTRAP_MAX_RETRIES, ensuring the instruction consistently refers to
Phase 1b.

In `@prompts/phase-1c-recon.md`:
- Around line 178-180: Update the stale CodeQL artifact path and the Phase
filename in the prompt: replace references to the legacy static path
"itemdb/codeql/normalized/file-signals.yml" with the per-run CodeQL artifact
location used by the new Phase 1c model (the per-run path your pipeline writes
for CodeQL file signals), and change the summary output filename from
"phase-1b-summary" to "phase-1c-summary" so the agent reads/writes the correct
per-run artifacts; search for those exact strings in the prompt text and update
them accordingly.

In `@tests/test_codeql_pipeline.py`:
- Around line 212-253: The file defines
test_pipeline_normalize_failure_marks_soft_failed_for_soft_policy twice, causing
the first definition to be overridden; rename the duplicate test function to a
unique, descriptive name (for example change the second occurrence to
test_pipeline_normalize_failure_marks_soft_failed_for_soft_policy_duplicate or a
name reflecting its intent) so both tests are distinct, and update any
references or fixtures if needed to match the new function name (look for the
symbol test_pipeline_normalize_failure_marks_soft_failed_for_soft_policy in this
diff to locate both definitions).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 1f7f88ac-4287-42f3-9df6-7c4db6983e79

📥 Commits

Reviewing files that changed from the base of the PR and between 3a10252 and b7505f4.

📒 Files selected for processing (59)

.gitignore
.project/codeql-sandbox-recipe-refactor-plan.md
Makefile
docs/sandbox.md
docs/workflow.md
prompts/README.md
prompts/phase-1-codeql-repair.md
prompts/phase-1a-profile.md
prompts/phase-1b-sandbox.md
prompts/phase-1c-recon.md
prompts/phase-6-report.md
prompts/sweep.md
templates/codeql-plan.yml
templates/sandbox-recipe.yml.example
templates/sandboxes/README.md
templates/sandboxes/_shared/codeql.sh
templates/sandboxes/c-cpp/docker-compose.yml
templates/sandboxes/dotnet/docker-compose.yml
templates/sandboxes/erlang-otp/docker-compose.yml
templates/sandboxes/generic/docker-compose.yml
templates/sandboxes/go/docker-compose.yml
templates/sandboxes/iac-terraform/docker-compose.yml
templates/sandboxes/java-maven/docker-compose.yml
templates/sandboxes/multi-service-compose/docker-compose.yml
templates/sandboxes/nested-virt/docker-compose.yml
templates/sandboxes/node/docker-compose.yml
templates/sandboxes/php/docker-compose.yml
templates/sandboxes/python/docker-compose.yml
templates/sandboxes/ruby/docker-compose.yml
templates/sandboxes/rust/docker-compose.yml
templates/sandboxes/web-static/docker-compose.yml
tests/test_codecome_check_codeql.py
tests/test_codeql_artifacts.py
tests/test_codeql_health.py
tests/test_codeql_in_docker.py
tests/test_codeql_packs.py
tests/test_codeql_pipeline.py
tests/test_codeql_platform.py
tests/test_codeql_runner.py
tests/test_phase_1_codeql_plan_repair.py
tests/test_phase_1_gates.py
tests/test_phase_1_mid_turn_forgiveness.py
tests/test_phase_1_prompts_threat_model.py
tests/test_phase_failure_state_reset.py
tests/test_prompts.py
tests/test_sandbox_recipe.py
tools/codecome/phase_1.py
tools/codeql/artifacts.py
tools/codeql/health.py
tools/codeql/in_docker.py
tools/codeql/packs.py
tools/codeql/pipeline.py
tools/codeql/platform.py
tools/codeql/runner.py
tools/phases/completion.py
tools/phases/phase_1_gates.py
tools/sandbox-bootstrap.py
tools/sandbox/__init__.py
tools/sandbox/recipe.py

💤 Files with no reviewable changes (3)

prompts/phase-1-codeql-repair.md
tests/test_phase_1_codeql_plan_repair.py
tools/phases/completion.py

coderabbitai · 2026-06-06T21:22:45Z

+7. **Multi-target invocation**: the runner resolves `Sandbox_build_target` from
+   the recipe per (analysis unit, language) pair and invokes CodeQL with the
+   resolved `build_command`. Identical commands across targets are allowed;


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use the canonical key name sandbox_build_target consistently.

Line 857 uses Sandbox_build_target (capital S), but the schema examples and surrounding contract use sandbox_build_target. This can cause implementation drift from copy/paste.

✏️ Proposed doc fix

-7. **Multi-target invocation**: the runner resolves `Sandbox_build_target` from +7. **Multi-target invocation**: the runner resolves `sandbox_build_target` from

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

7. **Multi-target invocation**: the runner resolves `Sandbox_build_target` from

the recipe per (analysis unit, language) pair and invokes CodeQL with the

resolved `build_command`. Identical commands across targets are allowed;

7. **Multi-target invocation**: the runner resolves `sandbox_build_target` from

the recipe per (analysis unit, language) pair and invokes CodeQL with the

resolved `build_command`. Identical commands across targets are allowed;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.project/codeql-sandbox-recipe-refactor-plan.md around lines 857 - 859, The doc uses a non-canonical key `Sandbox_build_target` (capital S) instead of the agreed canonical `sandbox_build_target`, causing copy/paste drift; update the text at the occurrence of `Sandbox_build_target` to use `sandbox_build_target` so it matches the schema and examples (search for the token `Sandbox_build_target` and replace with `sandbox_build_target` in the description of Multi-target invocation).

coderabbitai · 2026-06-06T21:22:45Z

 	@printf "    $(BOLD)make sandbox-test$(RESET)   Test the target inside the sandbox\n"
 	@printf "\n"
-	@printf "  $(BOLD)$(CYAN)Sandbox bootstrap (Phase 1c):$(RESET)\n"
+	@printf "  $(BOLD)$(CYAN)Sandbox bootstrap (Phase 1b):$(RESET)\n"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not modify Makefile without explicit instruction.

This PR changes Makefile, which violates the repository rule forbidding orchestration/config-file edits unless explicitly requested.

As per coding guidelines, "{codecome.yml,AGENTS.md,Makefile,.opencode/**}: Never modify project orchestration or configuration files ... unless explicitly instructed".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Makefile` at line 109, You changed the Makefile (the printf line that prints "Sandbox bootstrap (Phase 1b)"), which is disallowed; revert that modification so the Makefile remains exactly as in main (remove that hunk from your commit or reset the file), then amend the PR to exclude any Makefile/orchestration/config edits; reference the changed printf line to locate and undo the change and ensure only allowed source files are included in the branch before pushing.

Source: Coding guidelines

coderabbitai · 2026-06-06T21:22:45Z

+If `itemdb/codeql/last-run-manifest.yml` exists, read its `health` block.
+When `health.usable` is `true`, use `itemdb/codeql/normalized/alerts.yml`
+and `file-signals.yml` as enrichment for the target file. When
+`health.usable` is `false`, the CodeQL run did not produce trustworthy


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

CodeQL signal paths are outdated for per-run layout.

After this refactor, normalized outputs are per-run (itemdb/codeql/runs/<run_id>/normalized/...), not top-level itemdb/codeql/normalized/.... This prompt can point sweep to missing/stale files.

Suggested fix

-If `itemdb/codeql/last-run-manifest.yml` exists, read its `health` block. -When `health.usable` is `true`, use `itemdb/codeql/normalized/alerts.yml` -and `file-signals.yml` as enrichment for the target file. When +If `itemdb/codeql/last-run-manifest.yml` exists, read its `health` block and `run_id`. +When `health.usable` is `true`, read: +- `itemdb/codeql/runs/<run_id>/normalized/alerts.yml` +- `itemdb/codeql/runs/<run_id>/normalized/file-signals.yml` +as enrichment for the target file. When `health.usable` is `false`, the CodeQL run did not produce trustworthy output — do not import its signals.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@prompts/sweep.md` around lines 24 - 27, Update the prompt to use per-run CodeQL paths: read the run id from itemdb/codeql/last-run-manifest.yml (as before) and then reference enrichment files under itemdb/codeql/runs/<run_id>/normalized/alerts.yml and itemdb/codeql/runs/<run_id>/normalized/file-signals.yml instead of the old top-level itemdb/codeql/normalized/... paths; ensure any mention of "normalized/alerts.yml" or "file-signals.yml" in this prompt is replaced with the per-run path pattern using the manifest's run_id.

coderabbitai · 2026-06-06T21:22:45Z

 def test_phase_1c_reads_threat_model() -> None:
-    content = _read_prompt("phase-1c-sandbox.md")
+    content = _read_prompt("phase-1b-sandbox.md")
    assert "threat-model.md" in content


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Test target file mismatches the test intent.

test_phase_1c_reads_threat_model currently reads phase-1b-sandbox.md, so it no longer verifies the Phase 1c recon prompt requirement it is named for.

✅ Proposed fix

def test_phase_1c_reads_threat_model() -> None: - content = _read_prompt("phase-1b-sandbox.md") + content = _read_prompt("phase-1c-recon.md") assert "threat-model.md" in content

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_phase_1_prompts_threat_model.py` around lines 64 - 66, The test test_phase_1c_reads_threat_model is reading the wrong fixture ("phase-1b-sandbox.md"); change the call to _read_prompt to open the Phase 1c recon prompt (e.g., "phase-1c-recon.md") so the test actually verifies the Phase 1c recon requirement, leaving the assert ("threat-model.md" in content) as-is; update the string literal in the test function test_phase_1c_reads_threat_model to the correct prompt filename.

coderabbitai · 2026-06-06T21:22:45Z

+    if install_strategy not in ("mount-host-bundle",):
+        return True, ""
+
+    host_plat = host_platform()
+    container_plat = container_platform(service, compose_file)
+
+    if not platforms_compatible(host_plat, container_plat):
+        return False, (
+            f"CodeQL bundle is for {host_plat}; sandbox service "
+            f"{service!r} runs {container_plat}. "
+            "install_strategy=mount-host-bundle cannot cross platforms. "
+            "Use install_strategy=download-in-container or image-preinstalled "
+            "(not yet supported)."


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

copy-host-bundle bypasses platform compatibility checks.

Line 25 only guards mount-host-bundle, but copy-host-bundle also carries a host-built CodeQL bundle and can fail the same way on cross-platform host/container pairs. This should be validated identically.

🔧 Proposed fix

- if install_strategy not in ("mount-host-bundle",): + if install_strategy not in ("mount-host-bundle", "copy-host-bundle"): return True, "" @@ - "install_strategy=mount-host-bundle cannot cross platforms. " - "Use install_strategy=download-in-container or image-preinstalled " - "(not yet supported)." + f"install_strategy={install_strategy} cannot cross platforms. " + "Use install_strategy=image-preinstalled."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/codeql/in_docker.py` around lines 25 - 37, The guard only treats "mount-host-bundle" as a host-bundle case but misses "copy-host-bundle", so update the conditional that checks install_strategy to include "copy-host-bundle" (e.g., install_strategy not in ("mount-host-bundle", "copy-host-bundle")) and ensure the subsequent validation uses host_platform(), container_platform(service, compose_file), and platforms_compatible() to reject cross-platform combinations with an error message that mentions both mount-host-bundle and copy-host-bundle as unsupported for cross-platform installs.

coderabbitai · 2026-06-06T21:22:45Z

+def _generate_run_id() -> str:
+    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
+    fingerprint = hashlib.sha256(ts.encode()).hexdigest()[:8]
+    return f"{ts}-{fingerprint}"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Run ID generation can collide for runs started in the same second.

Line 18 hashes only ts, so it’s deterministic per second. Two starts within one second produce identical run_id, breaking per-run isolation.

Suggested fix

import hashlib +import uuid @@ def _generate_run_id() -> str: ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ") - fingerprint = hashlib.sha256(ts.encode()).hexdigest()[:8] + fingerprint = uuid.uuid4().hex[:8] return f"{ts}-{fingerprint}"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/codeql/pipeline.py` around lines 16 - 19, The _generate_run_id function currently derives fingerprint solely from ts, causing identical run IDs for starts within the same second; update _generate_run_id to add non-deterministic entropy (e.g., include microseconds, process id, or a random value such as uuid.uuid4().hex or secrets.token_hex) into the hashed input or as part of the fingerprint so each invocation is unique; keep the return format f"{ts}-{fingerprint}" and update the fingerprint calculation in _generate_run_id to hash timestamp plus the chosen random/entropy source (or replace hashing with a truncated uuid4) to avoid collisions.

coderabbitai · 2026-06-06T21:22:45Z

+        result = subprocess.run(
+            ["uname", "-sm"], capture_output=True, text=True, timeout=10
+        )
+        return result.stdout.strip()
+    except Exception:
+        return "unknown"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle non-zero probe exits as unknown instead of returning arbitrary stderr text.

Both probes only fall back on exceptions, but subprocess.run(...) with default check=False does not raise on command failure. A failed probe can therefore return empty/diagnostic text and be treated as a real platform string, causing false incompatibility in downstream gating.

Suggested fix

def host_platform() -> str: @@ - result = subprocess.run( + result = subprocess.run( ["uname", "-sm"], capture_output=True, text=True, timeout=10 ) - return result.stdout.strip() + if result.returncode != 0: + return "unknown" + value = result.stdout.strip() + return value or "unknown" except Exception: return "unknown" @@ - result = subprocess.run( + result = subprocess.run( [ "docker", "compose", "-f", str(compose_file), "exec", "-T", service, "uname", "-sm", ], capture_output=True, text=True, timeout=timeout, ) - return result.stdout.strip() or result.stderr.strip() + if result.returncode != 0: + return "unknown" + value = result.stdout.strip() + return value or "unknown" except Exception: return "unknown"

Also applies to: 31-40

🧰 Tools

🪛 Ruff (0.15.15)

[error] 16-16: Starting a process with a partial executable path

(S607)

[warning] 19-19: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/codeql/platform.py` around lines 15 - 20, The probe currently returns result.stdout.strip() even when the command fails; change the subprocess.run invocation handling in the probe (the call that uses ["uname", "-sm"] and the similar probe at lines 31-40) to treat non-zero exit codes as failures: after subprocess.run(...) check result.returncode and if it's non-zero (or result.stdout is empty/unreliable) return "unknown" instead of returning stdout/stderr; update the code paths around the subprocess.run call and the existing return result.stdout.strip() so both probes consistently return "unknown" on non-zero exits or empty output.

coderabbitai · 2026-06-06T21:22:45Z

+def cmd_recipe_validate(args: argparse.Namespace) -> int:
+    path = Path(args.path) if hasattr(args, "path") and args.path else SANDBOX_RECIPE_PATH
+    if not path.is_file():
+        print(C.fail(f"Sandbox recipe not found at {path}"), file=sys.stderr)
+        return 1
+
+    try:
+        from sandbox.recipe import load_recipe, validate_recipe
+        recipe = load_recipe(path)
+    except Exception as exc:
+        print(C.fail(f"Failed to load recipe: {exc}"), file=sys.stderr)
+        return 1
+
+    errors = validate_recipe(recipe, root=str(ROOT))
+    if errors:
+        print(C.fail(f"Sandbox recipe at {path} has {len(errors)} validation error(s):"), file=sys.stderr)
+        for err in errors:
+            print(f"  {C.SYM_BULLET} {err}")
+        return 1
+
+    print(C.ok(f"Sandbox recipe at {path} is valid."))
+    return 0
+
+
+def cmd_recipe_print(args: argparse.Namespace) -> int:
+    path = Path(args.path) if hasattr(args, "path") and args.path else SANDBOX_RECIPE_PATH
+    if not path.is_file():
+        print(C.fail(f"Sandbox recipe not found at {path}"), file=sys.stderr)
+        return 1
+
+    try:
+        from sandbox.recipe import load_recipe
+        recipe = load_recipe(path)
+    except Exception as exc:
+        print(C.fail(f"Failed to load recipe: {exc}"), file=sys.stderr)
+        return 1
+
+    if args.format == "json":
+        _emit(recipe, "json")
+    else:
+        from sandbox.recipe import dump_recipe
+        print(dump_recipe(recipe).rstrip())
+
+    return 0
+


🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Keep tools/sandbox-bootstrap.py as delegation-only; move recipe command logic into a package module.

These new command bodies add implementation in a root tools/*.py script. Per repo policy, this file should stay a thin wrapper and delegate to implementation under a package (e.g., tools/sandbox/...).

As per coding guidelines: “Standalone scripts at the tools/ root … should be thin wrappers that delegate to their respective packages. Implementation must live in the package, not the script.”

🧰 Tools

🪛 Ruff (0.15.15)

[warning] 1571-1571: Do not catch blind exception: Exception

(BLE001)

[warning] 1595-1595: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/sandbox-bootstrap.py` around lines 1562 - 1606, The cmd_recipe_validate and cmd_recipe_print functions implement business logic in the top-level tools/sandbox-bootstrap.py; per policy this file must be a thin delegating wrapper—move the implementation into a package module (e.g., create tools.sandbox.recipe_commands or similar) and keep these functions as one-line delegators that import and call the package functions (e.g., recipe_commands.validate_recipe_cmd and recipe_commands.print_recipe_cmd) passing through args; ensure all imports and heavy logic (load_recipe, validate_recipe, dump_recipe, _emit, error handling and printing) live in the new package module and update the wrapper functions cmd_recipe_validate and cmd_recipe_print to simply import and return the package function results.

Source: Coding guidelines

coderabbitai · 2026-06-06T21:22:45Z

+def _validate_codeql_hints(codeql: dict[str, Any], prefix: str) -> list[str]:
+    errors: list[str] = []
+
+    install_strategy = codeql.get("install_strategy")
+    if install_strategy is not None:
+        if not isinstance(install_strategy, str) or install_strategy not in VALID_INSTALL_STRATEGIES:
+            valid = ", ".join(sorted(VALID_INSTALL_STRATEGIES))
+            errors.append(
+                f"sandbox-recipe.yml: {prefix}.install_strategy {install_strategy!r} invalid (allowed: {valid})"
+            )
+
+    preferred_mode = codeql.get("preferred_execution_mode")
+    if preferred_mode is not None:
+        if not isinstance(preferred_mode, str) or preferred_mode not in VALID_EXECUTION_MODES:
+            valid = ", ".join(sorted(VALID_EXECUTION_MODES))
+            errors.append(
+                f"sandbox-recipe.yml: {prefix}.preferred_execution_mode {preferred_mode!r} invalid (allowed: {valid})"
+            )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Top-level codeql.default_execution_mode is currently unvalidated.

validate_recipe() passes top-level codeql into _validate_codeql_hints(), but this helper only validates preferred_execution_mode. That leaves invalid codeql.default_execution_mode values undetected even though the example schema uses that key.

🔧 Proposed fix

def _validate_codeql_hints(codeql: dict[str, Any], prefix: str) -> list[str]: errors: list[str] = [] @@ - preferred_mode = codeql.get("preferred_execution_mode") - if preferred_mode is not None: - if not isinstance(preferred_mode, str) or preferred_mode not in VALID_EXECUTION_MODES: + # Build-target hints use preferred_execution_mode; top-level codeql uses default_execution_mode. + mode_key = "preferred_execution_mode" if "preferred_execution_mode" in codeql else "default_execution_mode" + mode_value = codeql.get(mode_key) + if mode_value is not None: + if not isinstance(mode_value, str) or mode_value not in VALID_EXECUTION_MODES: valid = ", ".join(sorted(VALID_EXECUTION_MODES)) errors.append( - f"sandbox-recipe.yml: {prefix}.preferred_execution_mode {preferred_mode!r} invalid (allowed: {valid})" + f"sandbox-recipe.yml: {prefix}.{mode_key} {mode_value!r} invalid (allowed: {valid})" )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/sandbox/recipe.py` around lines 175 - 192, The helper _validate_codeql_hints currently skips validating the top-level codeql key default_execution_mode; add a validation block (similar to the preferred_mode check) that reads codeql.get("default_execution_mode"), ensures it's a string and one of VALID_EXECUTION_MODES, and appends an error with the same format (f"sandbox-recipe.yml: {prefix}.default_execution_mode {value!r} invalid (allowed: {valid})") when invalid so validate_recipe() will catch bad default_execution_mode values.

- Wire up CodeQL execution inside Docker Compose when preferred_execution_mode is docker-inside. - Untwist the Phase 1b and Phase 1c gate checks to match their corresponding phases. - Fix extraction health check false positive where a missing stat would pass. - Relax threat-model.md heading artifact check to be case-insensitive.

Copilot

Pull request overview

Copilot reviewed 62 out of 63 changed files in this pull request and generated 7 comments.

+        # workdir must be absolute inside the sandbox
+        workdir = target.get("workdir")
+        if not isinstance(workdir, str) or not workdir:
+            errors.append(
+                f"sandbox-recipe.yml: build_target {target_id!r} missing or empty 'workdir'"
+            )
+        elif not workdir.startswith("/"):
+            errors.append(
+                f"sandbox-recipe.yml: build_target {target_id!r} workdir {workdir!r} must be absolute (e.g. /workspace/src)"
+            )
+
+        # codeql hints
+        codeql = target.get("codeql")
+        if isinstance(codeql, dict):
+            errors.extend(_validate_codeql_hints(codeql, prefix=f"build_targets[{i}].codeql"))


+    cmd = [
+        "database", "create",
+        str(db_dir).replace(str(ROOT), workspace_root),
+        "-l", language_id,
+        "-s", str(ROOT / source_path).replace(str(ROOT), workspace_root),
+        "--overwrite",
+        "--no-run-unnecessary-builds",
+    ]
+
+    if cache_dir:
+        cmd.append(f"--codescanning-config={str(cache_dir).replace(str(ROOT), workspace_root)}") # dummy cache config wait no
+    # let's rebuild cmd cleanly


+    recipe_path = ROOT / "itemdb" / "notes" / "sandbox-recipe.yml"
+    sandbox_recipe = None
+    if recipe_path.exists():
+        try:
+            from sandbox.recipe import load_recipe
+            sandbox_recipe = load_recipe(recipe_path)
+        except Exception:
+            pass


+def _generate_run_id() -> str:
+    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
+    fingerprint = hashlib.sha256(ts.encode()).hexdigest()[:8]
+    return f"{ts}-{fingerprint}"


+    assert last_mani["status"] == "failed"
+
+
+def test_pipeline_normalize_failure_marks_soft_failed_for_soft_policy(tmp_path: Path) -> None:


+        return False, (
+            f"CodeQL bundle is for {host_plat}; sandbox service "
+            f"{service!r} runs {container_plat}. "
+            "install_strategy=mount-host-bundle cannot cross platforms. "
+            "Use install_strategy=download-in-container or image-preinstalled "
+            "(not yet supported)."


+Status: **Locked plan — implementation in progress**
+
+Branch purpose: define the target architecture and implementation plan before changing runtime code.
+


coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

tools/codeql/in_docker.py (1)
26-41: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Platform check should also validate copy-host-bundle strategy.

Line 26 only guards mount-host-bundle, but copy-host-bundle also carries a host-built CodeQL bundle and can fail on cross-platform host/container pairs.
🔧 Proposed fix
-    if install_strategy not in ("mount-host-bundle",):
+    if install_strategy not in ("mount-host-bundle", "copy-host-bundle"):
         return True, ""
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/codeql/in_docker.py` around lines 26 - 41, The platform compatibility
guard currently only checks install_strategy for "mount-host-bundle"; extend
that check to include "copy-host-bundle" so both host-built bundle strategies
run the host/container platform compatibility logic: update the conditional that
reads install_strategy not in ("mount-host-bundle",) to include
"copy-host-bundle", ensuring the existing calls to host_platform(),
container_platform(service, compose_file) and platforms_compatible(host_plat,
container_plat) are used unchanged and the same False return with the
explanatory message is triggered for either strategy when platforms mismatch.
tools/codeql/health.py (1)
183-191: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Failed runs are still misclassified as skipped when languages are empty.

Line 183 executes before line 186, so an explicit status == "failed" with no resolved languages is reclassified to "skipped". This masks real failures in downstream gating, exactly as flagged in the previous review.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/codeql/health.py` around lines 183 - 191, The logic currently returns
"skipped" when has_languages is false even if status == "failed"; change the
check order or condition so that a failed status isn't masked: either move the
status == "failed" block (the checks referencing db_ok and analyze_ok) before
the has_languages check, or change the first condition to if not has_languages
and status != "failed" to preserve real failures; update references to
has_languages, status, db_ok, and analyze_ok accordingly.

🧹 Nitpick comments (3)

tools/codeql/runner.py (2)
76-83: ⚡ Quick win

Consider logging recipe load failures.

The bare except: pass silently suppresses all errors when loading the sandbox recipe. This makes debugging difficult when the recipe exists but has validation errors or import failures.
♻️ Proposed refactor
     try:
         from sandbox.recipe import load_recipe
         sandbox_recipe = load_recipe(recipe_path)
-    except Exception:
-        pass
+    except Exception as exc:
+        _progress(progress, f"CodeQL: sandbox recipe not loaded: {exc}")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/codeql/runner.py` around lines 76 - 83, The current try/except around
loading the sandbox recipe (recipe_path, sandbox_recipe, load_recipe from
sandbox.recipe) swallows all errors; change it to catch Exception as e and log
the failure with the exception details before continuing so failures
(validation/import) are visible; use the module logger (or import logging and
getLogger) to call logger.exception or logger.error with the exception info and
a clear message including recipe_path and keep sandbox_recipe as None on
failure.
380-389: 💤 Low value

Prefix unused rc variable with underscore.

Lines 380 and 428 unpack the return code from exec_codeql but never use it. Prefix with _rc to indicate it's intentionally unused.
♻️ Proposed fix
-            ok, msg, rc = exec_codeql(
+            ok, msg, _rc = exec_codeql(
Apply the same fix at both line 380 and line 428.
Also applies to: 426-437
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/codeql/runner.py` around lines 380 - 389, Unpack the unused return code
from exec_codeql with a prefixed underscore to mark it intentionally unused:
change the tuple assignments that currently read "ok, msg, rc =
exec_codeql(...)" to "ok, msg, _rc = exec_codeql(...)" in both places where
exec_codeql is called (the block around the first occurrence unpacking ok, msg,
rc and the second occurrence in the other exec_codeql call within the 426-437
region); keep the returned ok and msg handling unchanged.
tools/codecome/phase_1.py (1)
154-164: ⚡ Quick win

The repair guidance assumes one specific failure mode.

The static guidance text assumes Gate 1a failure is always due to empty languages lists, but check_phase_1a can fail for many other reasons (invalid paths, unsupported build_mode, duplicate unit IDs, etc.). While the actual gate error messages will appear in the console output before this repair prompt, the hardcoded instructions could confuse the AI if the real issue is different.

Consider passing the actual gate failure context (e.g., captured from check_phase_1a output) instead of static text, or making the guidance more general.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/codecome/phase_1.py` around lines 154 - 164, _phas
e_1a_codeql_plan_repair_output currently returns a static repair message that
only addresses empty languages; change it to accept or incorporate the actual
Gate 1a failure context produced by check_phase_1a (or else produce a more
general repair message) so the AI receives accurate guidance: update the
function _phase_1a_codeql_plan_repair_output (or create a helper) to take an
error string/enum from check_phase_1a and either interpolate that failure text
into the repair prompt or emit a generic set of steps covering invalid paths,
unsupported build_mode, duplicate unit IDs, and empty languages, rather than the
current single-mode instruction.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/codeql/runner.py`:
- Around line 331-355: Remove the dead initial command construction that builds
cmd and appends the stale cache flag (the block that defines cmd then
conditionally appends "--codescanning-config=...") because it is immediately
overwritten; instead keep the single clean construction that uses
str(docker_ctx["binary"]) if is_docker else str(binary) and then call
_add_common_caches(cmd, cache_dir) and perform the ROOT->workspace_root
replacement when is_docker; update/remove the misleading comments ("let's
rebuild cmd cleanly" / "dummy cache config wait no") and ensure references to
db_dir, language_id, source_path, ROOT, workspace_root, is_docker, docker_ctx,
binary, and _add_common_caches remain in the final code.

---

Duplicate comments:
In `@tools/codeql/health.py`:
- Around line 183-191: The logic currently returns "skipped" when has_languages
is false even if status == "failed"; change the check order or condition so that
a failed status isn't masked: either move the status == "failed" block (the
checks referencing db_ok and analyze_ok) before the has_languages check, or
change the first condition to if not has_languages and status != "failed" to
preserve real failures; update references to has_languages, status, db_ok, and
analyze_ok accordingly.

In `@tools/codeql/in_docker.py`:
- Around line 26-41: The platform compatibility guard currently only checks
install_strategy for "mount-host-bundle"; extend that check to include
"copy-host-bundle" so both host-built bundle strategies run the host/container
platform compatibility logic: update the conditional that reads install_strategy
not in ("mount-host-bundle",) to include "copy-host-bundle", ensuring the
existing calls to host_platform(), container_platform(service, compose_file) and
platforms_compatible(host_plat, container_plat) are used unchanged and the same
False return with the explanatory message is triggered for either strategy when
platforms mismatch.

---

Nitpick comments:
In `@tools/codecome/phase_1.py`:
- Around line 154-164: _phas e_1a_codeql_plan_repair_output currently returns a
static repair message that only addresses empty languages; change it to accept
or incorporate the actual Gate 1a failure context produced by check_phase_1a (or
else produce a more general repair message) so the AI receives accurate
guidance: update the function _phase_1a_codeql_plan_repair_output (or create a
helper) to take an error string/enum from check_phase_1a and either interpolate
that failure text into the repair prompt or emit a generic set of steps covering
invalid paths, unsupported build_mode, duplicate unit IDs, and empty languages,
rather than the current single-mode instruction.

In `@tools/codeql/runner.py`:
- Around line 76-83: The current try/except around loading the sandbox recipe
(recipe_path, sandbox_recipe, load_recipe from sandbox.recipe) swallows all
errors; change it to catch Exception as e and log the failure with the exception
details before continuing so failures (validation/import) are visible; use the
module logger (or import logging and getLogger) to call logger.exception or
logger.error with the exception info and a clear message including recipe_path
and keep sandbox_recipe as None on failure.
- Around line 380-389: Unpack the unused return code from exec_codeql with a
prefixed underscore to mark it intentionally unused: change the tuple
assignments that currently read "ok, msg, rc = exec_codeql(...)" to "ok, msg,
_rc = exec_codeql(...)" in both places where exec_codeql is called (the block
around the first occurrence unpacking ok, msg, rc and the second occurrence in
the other exec_codeql call within the 426-437 region); keep the returned ok and
msg handling unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: dee9eb3e-1dfc-4229-aeca-b23dc6631486

📥 Commits

Reviewing files that changed from the base of the PR and between b7505f4 and 7f47242.

📒 Files selected for processing (16)

prompts/phase-1a-profile.md
templates/codeql-plan.yml
tests/test_codeql_packs.py
tests/test_codeql_runner.py
tests/test_phase_1_gates.py
tests/test_phase_1_gates_threat_model.py
tests/test_phase_completion_threat_model.py
tools/codecome/phase_1.py
tools/codeql/capabilities.py
tools/codeql/health.py
tools/codeql/in_docker.py
tools/codeql/packs.py
tools/codeql/runner.py
tools/phases/artifact_checks.py
tools/phases/completion.py
tools/phases/phase_1_gates.py

💤 Files with no reviewable changes (1)

tools/codeql/capabilities.py

🚧 Files skipped from review as they are similar to previous changes (1)

prompts/phase-1a-profile.md

coderabbitai · 2026-06-07T19:09:33Z

+    cmd = [
+        "database", "create",
+        str(db_dir).replace(str(ROOT), workspace_root),
+        "-l", language_id,
+        "-s", str(ROOT / source_path).replace(str(ROOT), workspace_root),
+        "--overwrite",
+        "--no-run-unnecessary-builds",
+    ]
+
+    if cache_dir:
+        cmd.append(f"--codescanning-config={str(cache_dir).replace(str(ROOT), workspace_root)}") # dummy cache config wait no
+    # let's rebuild cmd cleanly
    cmd = [
-        str(binary), "database", "create",
-        str(db_dir),
+        str(docker_ctx["binary"]) if is_docker else str(binary), 
+        "database", "create",
+        str(db_dir).replace(str(ROOT), workspace_root),
        "-l", language_id,
-        "-s", str(ROOT / source_path),
+        "-s", str(ROOT / source_path).replace(str(ROOT), workspace_root),
        "--overwrite",
        "--no-run-unnecessary-builds",
    ]
+
    _add_common_caches(cmd, cache_dir)
+    if is_docker:
+        cmd = [arg.replace(str(ROOT), workspace_root) for arg in cmd]


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Remove dead code and clarify command construction.

Lines 331-342 construct a command that is then immediately replaced by lines 343-356. The comment "let's rebuild cmd cleanly" and "dummy cache config wait no" suggest incomplete refactoring. Lines 331-342 appear to be dead code.

♻️ Proposed cleanup

Remove lines 331-342 entirely and keep only the clean construction:

db_dir.parent.mkdir(parents=True, exist_ok=True) is_docker = docker_ctx is not None workspace_root = docker_ctx["workspace_root"] if is_docker else str(ROOT) - cmd = [ - "database", "create", - str(db_dir).replace(str(ROOT), workspace_root), - "-l", language_id, - "-s", str(ROOT / source_path).replace(str(ROOT), workspace_root), - "--overwrite", - "--no-run-unnecessary-builds", - ] - - if cache_dir: - cmd.append(f"--codescanning-config={str(cache_dir).replace(str(ROOT), workspace_root)}") # dummy cache config wait no - # let's rebuild cmd cleanly cmd = [ str(docker_ctx["binary"]) if is_docker else str(binary), "database", "create", str(db_dir).replace(str(ROOT), workspace_root), "-l", language_id, "-s", str(ROOT / source_path).replace(str(ROOT), workspace_root), "--overwrite", "--no-run-unnecessary-builds", ] - _add_common_caches(cmd, cache_dir) if is_docker: cmd = [arg.replace(str(ROOT), workspace_root) for arg in cmd]

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

cmd = [

"database", "create",

str(db_dir).replace(str(ROOT), workspace_root),

"-l", language_id,

"-s", str(ROOT / source_path).replace(str(ROOT), workspace_root),

"--overwrite",

"--no-run-unnecessary-builds",

]

if cache_dir:

cmd.append(f"--codescanning-config={str(cache_dir).replace(str(ROOT), workspace_root)}") # dummy cache config wait no

# let's rebuild cmd cleanly

cmd = [

str(binary), "database", "create",

str(db_dir),

str(docker_ctx["binary"]) if is_docker else str(binary),

"database", "create",

str(db_dir).replace(str(ROOT), workspace_root),

"-l", language_id,

"-s", str(ROOT / source_path),

"-s", str(ROOT / source_path).replace(str(ROOT), workspace_root),

"--overwrite",

"--no-run-unnecessary-builds",

]

_add_common_caches(cmd, cache_dir)

if is_docker:

cmd = [arg.replace(str(ROOT), workspace_root) for arg in cmd]

db_dir.parent.mkdir(parents=True, exist_ok=True)

is_docker = docker_ctx is not None

workspace_root = docker_ctx["workspace_root"] if is_docker else str(ROOT)

cmd = [

str(docker_ctx["binary"]) if is_docker else str(binary),

"database", "create",

str(db_dir).replace(str(ROOT), workspace_root),

"-l", language_id,

"-s", str(ROOT / source_path).replace(str(ROOT), workspace_root),

"--overwrite",

"--no-run-unnecessary-builds",

]

_add_common_caches(cmd, cache_dir)

if is_docker:

cmd = [arg.replace(str(ROOT), workspace_root) for arg in cmd]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/codeql/runner.py` around lines 331 - 355, Remove the dead initial command construction that builds cmd and appends the stale cache flag (the block that defines cmd then conditionally appends "--codescanning-config=...") because it is immediately overwritten; instead keep the single clean construction that uses str(docker_ctx["binary"]) if is_docker else str(binary) and then call _add_common_caches(cmd, cache_dir) and perform the ROOT->workspace_root replacement when is_docker; update/remove the misleading comments ("let's rebuild cmd cleanly" / "dummy cache config wait no") and ensure references to db_dir, language_id, source_path, ROOT, workspace_root, is_docker, docker_ctx, binary, and _add_common_caches remain in the final code.

pruiz closed this Jun 5, 2026

pruiz force-pushed the wip/codeql-sandbox-recipe-refactor branch from 17d5d25 to f140f4b Compare June 5, 2026 21:01

pruiz reopened this Jun 5, 2026

pruiz added 9 commits June 6, 2026 16:27

docs: plan CodeQL sandbox recipe refactor

c145c88

chore: delete obsolete repair files, remove skipped tests

7058e5c

Delete prompts/phase-1-codeql-repair.md and tests/test_phase_1_codeql_plan_repair.py. Remove three skipped repair-related tests from test_codecome_check_codeql.py. Full test suite: 730 passed, 0 failures.

pruiz force-pushed the wip/codeql-sandbox-recipe-refactor branch from 9445216 to 7058e5c Compare June 6, 2026 20:54

pruiz requested a review from Copilot June 6, 2026 20:55

Copilot started reviewing on behalf of pruiz June 6, 2026 20:55 View session

Copilot AI reviewed Jun 6, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 6, 2026

View reviewed changes

pruiz added 3 commits June 6, 2026 23:45

fix(codeql): allow recipe-backed manual builds

4de38ff

fix(codeql): soft-skip empty language units

86756d2

pruiz requested a review from Copilot June 7, 2026 18:59

Copilot started reviewing on behalf of pruiz June 7, 2026 18:59 View session

Copilot AI reviewed Jun 7, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 7, 2026

View reviewed changes

		Branch purpose: define the target architecture and implementation plan before changing runtime code.

		## 1. Problem statement

		assert last_mani["status"] == "failed"


		def test_pipeline_normalize_failure_marks_soft_failed_for_soft_policy(tmp_path: Path) -> None:

		Status: Locked plan — implementation in progress

		Branch purpose: define the target architecture and implementation plan before changing runtime code.

Conversation

pruiz commented Jun 5, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Scope

Review focus

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

github-actions Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Coverage Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 7, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pruiz commented Jun 5, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 5, 2026 •

edited

Loading

github-actions Bot commented Jun 5, 2026 •

edited

Loading